Discovering Informative Patterns and Data Cleaning

نویسندگان

  • Isabelle Guyon
  • Nada Matic
  • Vladimir Vapnik
چکیده

We present a method for discovering informative patterns from data. With this method, large databases can be reduced to only a few representative data entries. Our framework encompasses also methods for cleaning databases containing corrupted data. Both on-line and off-line algorithms are proposed and experimentally checked on databases of handwritten images. The generality of the framework makes it an attractive candidate for new applications in knowledge discovery. Keywordsknowledge discovery, machine learning, informative patterns, data cleaning, information gain.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

General Frameworks for Combined Mining: Discovering Informative Knowledge in Complex Data

Enterprise data mining applications such as mining government service data often involve multiple large heterogeneous data sources, user preferences and business impact. Business people expect data mining deliverables to inform direct business decision-making actions. In such situations, a single method or one-step mining is often limited in discovering informative knowledge. It would also be v...

متن کامل

Identifying Informative Web Content Blocks using Web Page Segmentation

Information Extraction has become an important task for discovering useful knowledge or information from the Web. A crawler system, which gathers the information from the Web, is one of the fundamental necessities of Information Extraction. A search engine uses a crawler to crawl and index web pages. Search engine takes into account only the informative content for indexing. In addition to info...

متن کامل

A Survey on Preprocessing Methods for Web Usage Data

World Wide Web is a huge repository of web pages and links. It provides abundance of information for the Internet users. The growth of web is tremendous as approximately one million pages are added daily. Users’ accesses are recorded in web logs. Because of the tremendous usage of web, the web log files are growing at a faster rate and the size is becoming huge. Web data mining is the applicati...

متن کامل

Effective web log mining and online navigational pattern prediction

The web has become the world's largest repository of knowledge. Web usage mining is the process of discovering knowledge from the interactions generated by the user in the form of access logs, cookies, and user sessions data. Web Mining consists of three different categories, namely Web Content Mining, Web Structure Mining, and Web Usage Mining (is the process of discovering knowledge from the ...

متن کامل

Combined Mining Approach to Generate Patterns for Complex Data

In Data mining applications, which often involve complex data like multiple heterogeneous data sources, user preferences, decision-making actions and business impacts etc., the complete useful information cannot be obtained by using single data mining method in the form of informative patterns as that would consume more time and space, if and only if it is possible to join large relevant data s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994